Back to Article
EXPECT Copper AEP
Download Source

EXPECT Copper AEP

Author

Sam Welch

Published

November 25, 2025

In [1]:
Code
library(STOPeData)
library(DT)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Code
library(tidyr)
library(ggplot2)
library(forcats)
library(viridis)
Loading required package: viridisLite
Code
library(stringr)
library(ggridges)
library(ggh4x)
library(tidygraph)

Attaching package: 'tidygraph'
The following object is masked from 'package:stats':

    filter
Code
library(ggraph)
library(ggiraph)
library(leaflet)
library(sf)
Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
Code
library(shiny)

Attaching package: 'shiny'
The following objects are masked from 'package:DT':

    dataTableOutput, renderDataTable
Code
library(dplyr)
library(bslib)

Attaching package: 'bslib'
The following object is masked from 'package:utils':

    page
Code
library(ggtext)
library(data.table)

Attaching package: 'data.table'
The following objects are masked from 'package:dplyr':

    between, first, last
Code
library(eDataDRF)

Attaching package: 'eDataDRF'
The following objects are masked from 'package:STOPeData':

    abbreviate_string, altitude_units_vocabulary,
    analytical_protocols_vocabulary, areas_vocabulary,
    coordinate_systems_vocabulary, countries_vocabulary,
    dummy_parameters_vocabulary, environ_compartments_sub_vocabulary,
    environ_compartments_vocabulary, extraction_protocols_vocabulary,
    fractionation_protocols_vocabulary, gender_vocabulary,
    generate_protocol_id, geographic_features_sub_vocabulary,
    geographic_features_vocabulary, get_dataset_display_name,
    initialise_biota_tibble, initialise_campaign_tibble,
    initialise_compartments_tibble, initialise_CREED_data_tibble,
    initialise_CREED_scores_tibble, initialise_measurements_tibble,
    initialise_methods_tibble, initialise_parameters_tibble,
    initialise_references_tibble, initialise_samples_tibble,
    initialise_sites_tibble, lifestage_vocabulary,
    measured_categories_vocabulary, measured_flags_vocabulary,
    measured_types_vocabulary, parameter_types_sub_vocabulary,
    parameter_types_vocabulary, parameter_unit_vocabulary,
    protocol_categories_vocabulary, protocol_options_vocabulary,
    reference_character_limits, sampling_protocols_vocabulary,
    species_groups_vocabulary, species_names_vocabulary,
    tissue_types_vocabulary, uncertainty_types_vocabulary
Code
source("_targets.R")
here() starts at C:/Users/SAW/Local Documents/EXPECT AEP/EXPECT_AEP_R_Project
here() starts at C:/Users/SAW/Local Documents/EXPECT AEP/EXPECT_AEP_R_Project
ℹ Loading EXPECT_AEP_R_Project
Code
tar_make()
here() starts at C:/Users/SAW/Local Documents/EXPECT AEP/EXPECT_AEP_R_Project
here() starts at C:/Users/SAW/Local Documents/EXPECT AEP/EXPECT_AEP_R_Project
ℹ Loading EXPECT_AEP_R_Project
_targets.R here() resolves to: C:/Users/SAW/Local Documents/EXPECT AEP/EXPECT_AEP_R_Project 
_targets.R wd: C:/Users/SAW/Local Documents/EXPECT AEP/EXPECT_AEP_R_Project 
✔ skipped pipeline [1.1s, 28 skipped]
In [2]:
Code
# load in any data we need from the targets workflow
literature_merged_data <- tar_read(load_literature_pqt)
literature_reference_data <- tar_read(reference_data)
literature_campaign_data <- tar_read(campaign_data)
literature_sites_data <- tar_read(sites_data)
literature_qc <- tar_read(data_quality_report)
wgs84_geo <- tar_read(wgs84_geography)

species_names <- eDataDRF::species_names_vocabulary()
species_lookup <- species_names |>
  with(setNames(SPECIES_COMMON_NAME, SPECIES_NAME))
In [3]:
Code

copper_toxicity_thresholds <- tar_read(copper_toxicity_thresholds)
# Define threshold colors based on the report ----
threshold_colours <- c(
  "Background (I)" = "#0070C0", # Blue
  "Good (II)" = "#00B050", # Green
  "Moderate (III)" = "#FFC000", # Yellow
  "Poor (IV)" = "#FF6600", # Orange
  "Very Poor (V)" = "#FF0000" # Red
)

compartment_colours <- c(
  # Aquatic ----
  "Freshwater" = "#4A90E2", # clear blue
  "Marine/Salt Water" = "#1E5A8E", # deep ocean blue
  "Brackish/Transitional Water" = "#6BA5C8", # mid-blue (between fresh/marine)
  "Groundwater" = "#2C5F7B", # darker, subdued blue
  "Wastewater" = "#8B7355", # murky brown
  "Liquid Growth Medium" = "#9FD8CB", # pale greenish-blue
  "Rainwater" = "#B3D9FF", # light sky blue
  "Stormwater" = "#607D8B", # grey-blue
  "Leachate" = "#6B5344", # dark muddy brown
  "Aquatic Sediment" = "#EBCF1B", # your existing yellow
  "Porewater" = "#7FA0B8", # muted blue-grey
  "Sludge" = "#67AB33", # your existing green

  # Atmospheric ----
  "Indoor Air" = "#E8F4F8", # very pale blue-grey
  "Outdoor Air" = "#87CEEB", # sky blue

  # Terrestrial ----
  "Terrestrial Biological Residue" = "#8B6F47", # organic brown
  "Soil H Horizon (Peat)" = "#3E2723", # very dark brown
  "Soil O Horizon (Organic)" = "#5D4037", # dark organic brown
  "Soil A Horizon (Topsoil)" = "#795548", # medium brown
  "Soil E Horizon (Mineral)" = "#A1887F", # light greyish-brown
  "Soil S Horizon (Mineral)" = "#BCAAA4", # pale brown-grey
  "Soil C Horizon (Parent Material)" = "#D7CCC8", # very pale brown
  "Soil R Horizon (Bedrock)" = "#9E9E9E", # grey

  # Biota ----
  "Biota, Terrestrial" = "#558B2F", # forest green
  "Biota, Aquatic" = "#00695C", # teal
  "Biota, Atmospheric" = "#B39DDB", # light purple (birds/insects)
  "Biota, Other" = "#9C27B0" # distinct purple
)

# Helper function for sample sizes
calculate_sample_sizes <- function(data) {
  data |>
    group_by(
      REFERENCE_ID,
      OCEAN_COUNTRY,
      SITE_GEOGRAPHIC_FEATURE,
      ENVIRON_COMPARTMENT_SUB
    ) |>
    summarise(total_n = sum(MEASURED_N, na.rm = TRUE), .groups = "drop")
}

# Create sub-compartment letter codes ----
subcomp_codes <- c(
  # Aquatic
  "Freshwater" = "F",
  "Marine/Salt Water" = "M",
  "Brackish/Transitional Water" = "B",
  "Groundwater" = "G",
  "Wastewater" = "WW",
  "Liquid Growth Medium" = "LGM",
  "Rainwater" = "R",
  "Stormwater" = "SW",
  "Leachate" = "L",
  "Aquatic Sediment" = "AS",
  "Porewater" = "P",
  "Sludge" = "Sl",
  # Atmospheric
  "Indoor Air" = "IA",
  "Outdoor Air" = "OA",
  # Terrestrial
  "Terrestrial Biological Residue" = "TBR",
  "Soil H Horizon (Peat)" = "H",
  "Soil O Horizon (Organic)" = "O",
  "Soil A Horizon (Topsoil)" = "A",
  "Soil E Horizon (Mineral)" = "E",
  "Soil S Horizon (Mineral)" = "S",
  "Soil C Horizon (Parent Material)" = "C",
  "Soil R Horizon (Bedrock)" = "R",
  # Biota
  "Biota, Terrestrial" = "BT",
  "Biota, Aquatic" = "BA",
  "Biota, Atmospheric" = "BAm",
  "Biota, Other" = "BO"
)

Dataset Overview

To Do

Data Quality Overview

In [4]:
Code

#' Generate a text summary of data quality issues
#'
#' @param qc_report Output from check_data_quality()
#' @return A character string with markdown-formatted summary
format_quality_summary <- function(qc_report) {
  s <- qc_report$summary

  glue::glue(
    "
Dataset contains **{s$total_rows}** rows from **{s$total_references}** references across **{s$total_sites}** sites.

**Issues identified:**

- **Measurements**: {s$n_rows_missing_measurements} rows ({s$n_refs_missing_measurements} references) missing all measurement data (value, LOQ, and LOD)
- **Sample size**: {s$n_rows_missing_n} rows ({s$n_refs_missing_n} references) missing or zero sample size
- **Methods**: {s$n_rows_missing_methods} rows ({s$n_refs_missing_methods} references) missing method information
- **Uncertainty**: {s$n_rows_missing_uncertainty} rows ({s$n_refs_missing_uncertainty} references) missing uncertainty type
- **Site data**: {s$n_sites_missing_data} sites ({s$n_refs_missing_sites} references) missing location/geographic data
- **Biota data**: {s$n_rows_missing_biota} rows ({s$n_refs_missing_biota} references) missing species/tissue information
"
  )
}

cat(format_quality_summary(literature_qc))
Dataset contains **48972** rows from **32** references across **5142** sites.

**Issues identified:**

- **Measurements**: 52 rows (2 references) missing all measurement data (value, LOQ, and LOD)
- **Sample size**: 49 rows (1 references) missing or zero sample size
- **Methods**: 49 rows (1 references) missing method information
- **Uncertainty**: 48842 rows (18 references) missing uncertainty type
- **Site data**: 4 sites (2 references) missing location/geographic data
- **Biota data**: 1 rows (1 references) missing species/tissue information

Missing Measurements

Rows where measured value, LOQ, and LOD are all missing.

Known issues:

  • 2025PelkonenEnvironmentalImpactOf does not report LOD/LOQ values in main paper or ESI.
In [5]:
Code

if (nrow(literature_qc$missing_measurements) > 0) {
  DT::datatable(
    literature_qc$missing_measurements |>
      dplyr::select(-sample_ids),
    options = list(pageLength = 10),
    caption = "References with missing measurement data"
  )
} else {
  cat("*No missing measurements found.*")
}

Missing Sample Size

Rows where MEASURED_N is missing or zero.

In [6]:
Code

if (nrow(literature_qc$missing_n) > 0) {
  DT::datatable(
    literature_qc$missing_n |>
      dplyr::select(-sample_ids),
    options = list(pageLength = 10),
    caption = "References with missing sample size"
  )
} else {
  cat("*No missing sample sizes found.*")
}

Missing Methods

Rows missing any of: analytical protocol, extraction protocol, fractionation protocol, or sampling protocol.

In [7]:
Code


if (nrow(literature_qc$missing_methods) > 0) {
  display_df <- literature_qc$missing_methods |>
    dplyr::mutate(
      analytical_issue = dplyr::case_when(
        missing_analytical ~ "Missing",
        .default = NA_character_
      ),
      extraction_issue = dplyr::case_when(
        missing_extraction ~ "Missing",
        .default = NA_character_
      ),
      fractionation_issue = dplyr::case_when(
        missing_fractionation ~ "Missing",
        .default = NA_character_
      ),
      sampling_issue = dplyr::case_when(
        missing_sampling ~ "Missing",
        .default = NA_character_
      )
    ) |>
    dplyr::select(
      REFERENCE_ID,
      n_rows,
      analytical_issue,
      extraction_issue,
      fractionation_issue,
      sampling_issue
    )

  DT::datatable(
    display_df,
    options = list(pageLength = 10),
    caption = "References with missing method information"
  )
} else {
  cat("*No missing methods found.*")
}

Missing Uncertainty

Rows where UNCERTAINTY_TYPE or UNCERTAINTY_*_STANDARD are missing.

In [8]:
Code

if (nrow(literature_qc$missing_uncertainty) > 0) {
  display_df <- literature_qc$missing_uncertainty |>
    dplyr::mutate(
      uncertainty_type_issue = dplyr::case_when(
        type_missing ~ "Missing",
        type_not_reported ~ "Not reported",
        type_not_relevant ~ "Not relevant",
        .default = NA_character_
      ),
      upper_bound_issue = dplyr::case_when(
        upper_missing ~ "Missing",
        upper_zero ~ "Zero",
        upper_below_value ~ "< Average",
        .default = NA_character_
      ),
      lower_bound_issue = dplyr::case_when(
        lower_missing ~ "Missing",
        lower_zero ~ "Zero",
        lower_above_value ~ "> Average",
        .default = NA_character_
      )
    ) |>
    dplyr::select(
      REFERENCE_ID,
      n_rows,
      uncertainty_type_issue,
      upper_bound_issue,
      lower_bound_issue
    )

  DT::datatable(
    display_df,
    options = list(pageLength = 10),
    caption = "References with missing or problematic uncertainty"
  )
} else {
  cat("*No uncertainty issues found.*")
}

Missing Site Data

Sites missing coordinates, geographic features, country, or ocean information.

In [9]:
Code

if (nrow(literature_qc$missing_sites) > 0) {
  DT::datatable(
    literature_qc$missing_sites,
    options = list(pageLength = 10),
    caption = "Sites with missing geographic data"
  )
} else {
  cat("*No missing site data found.*")
}

Missing Biota Data

Biota samples missing species, tissue, lifestage, gender, or species group.

In [10]:
Code

if (nrow(literature_qc$missing_biota) > 0) {
  DT::datatable(
    literature_qc$missing_biota |>
      dplyr::select(-sample_ids),
    options = list(pageLength = 10),
    caption = "References with missing biota information"
  )
} else {
  cat("*No missing biota data found.*")
}

Literature Summary

@tbl-paper-summaries provides a comprehensive summary of all papers included in this analysis, showing the number of measurements extracted from each study and their associated sampling campaigns.

In [11]:
Code
literature_merged_data |>
  dplyr::left_join(literature_reference_data) |>
  dplyr::left_join(
    literature_campaign_data |> select(CAMPAIGN_NAME_SHORT, CAMPAIGN_COMMENT)
  ) |>
  dplyr::group_by(
    TITLE,
    YEAR,
    AUTHOR,
    CAMPAIGN_COMMENT
  ) |>
  dplyr::reframe(SUM_N = sum(MEASURED_N)) |>
  datatable(options = list(pageLength = 5, lengthMenu = c(5, 10, 15, 20)))
Joining with `by = join_by(REFERENCE_ID, YEAR, TITLE, DATA_SOURCE)`
Joining with `by = join_by(CAMPAIGN_NAME_SHORT, CAMPAIGN_COMMENT)`
In [12]:
Summary of analysed papers showing publication details, associated campaigns, and number of copper measurements extracted from each study.

Compartment Coverage

In [13]:
Code

# Prepare data for plotting ----
plot_data <- literature_merged_data |>
  group_by(REFERENCE_ID, ENVIRON_COMPARTMENT_SUB) |>
  distinct() |>
  reframe(count = sum(MEASURED_N, na.rm = TRUE)) |>
  # Join year data for y-axis ordering
  left_join(
    literature_merged_data |> distinct(REFERENCE_ID, YEAR),
    by = "REFERENCE_ID"
  ) |>
  filter(!is.na(ENVIRON_COMPARTMENT_SUB))

# Calculate text color threshold ----
threshold_value <- mean(plot_data$count, na.rm = TRUE)

# Create plot ----
ggplot(
  plot_data,
  aes(
    y = REFERENCE_ID,
    x = ENVIRON_COMPARTMENT_SUB,
    fill = count
  )
) +
  geom_tile() +
  geom_text(aes(
    label = count,
    color = if_else(count > threshold_value, "black", "white")
  )) +
  scale_fill_viridis(name = "Sample Size (n)") +
  scale_color_identity() +
  theme_aep_plots() +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1),
    axis.title.y = element_blank(),
    legend.position = "none"
  ) +
  labs(x = "Environmental Sub-Compartment")
Number of environmental sub-compartments sampled per paper. Each bar represents a unique publication, with height indicating the diversity of compartments (e.g., surface water, sediment, soil) analysed within that study.

Species Coverage

@fig-paper-species illustrates the taxonomic diversity captured across studies, highlighting which papers focused on biota measurements and the range of species groups investigated.

In [14]:
Code

# add species common names missing from ECOTOX data
new_species <- c(
  "Phoca vitulina" = "Harbor Seal",
  "Calanus finmarchicus" = "Copepod",
  "Calanus glacialis" = "Arctic Copepod",
  "Calanus hyperboreus" = "Copepod",
  "Conchoecia borealis" = "Ostracod",
  "Euchaeta barbata" = "Copepod",
  "Euchaeta glacialis" = "Arctic Copepod",
  "Euchaeta norvegica" = "Copepod",
  "Eukrohnia hamata" = "Arrow Worm",
  "Hymenodora glacialis" = "Arctic Shrimp",
  "Meganyctiphanes norvegica" = "Northern Krill",
  "Metridia longa" = "Copepod",
  "Themisto abyssorum" = "Amphipod",
  "Themisto libellula" = "Amphipod",
  "Thysanoessa inermis" = "Arctic Krill",
  "Astarte borealis" = "Northern Astarte",
  "Mya arenaria" = "Soft-shell Clam",
  "Cystophora cristata" = "Hooded Seal",
  "Pagophilus groenlandicus" = "Harp Seal",
  "Salmo trutta" = "Brown Trout",
  "Arenicola marina" = "Lugworm",
  "Gammarus oceanicus" = "Amphipod",
  "Littorina rudis" = "Rough Periwinkle",
  "Mytilus edulis" = "Blue Mussel",
  "Nucella lapillus" = "Dog Whelk",
  "Phoca groenlandica" = "Harp Seal",
  "Ursus maritimus" = "Polar Bear",
  "Rissa tridactyla" = "Black-legged Kittiwake",
  "Alitta virens" = "King Ragworm",
  "Reinhardtius hippoglossoides" = "Greenland Halibut"
)

# Extend species_lookup
species_lookup <- c(species_lookup, new_species)

# Prepare data for plotting ----
plot_data <- literature_merged_data |>
  group_by(REFERENCE_ID, SAMPLE_SPECIES) |>
  distinct() |>
  reframe(count = sum(MEASURED_N, na.rm = TRUE), SPECIES_GROUP) |>
  # Join year data for y-axis ordering
  left_join(
    literature_merged_data |> distinct(REFERENCE_ID, YEAR),
    by = "REFERENCE_ID"
  ) |>
  filter(!is.na(SAMPLE_SPECIES)) |>
  distinct()

# Create formatted labels
plot_data <- plot_data |>
  mutate(
    formatted_species = glue::glue(
      "**{species_lookup[SAMPLE_SPECIES]}**<br>*({SAMPLE_SPECIES})*"
    )
  )


# Calculate text color threshold ----
threshold_value <- 2000 # set manually, as the 2050 mussels tend to overwhelm the others

# Create plot ----

ggplot(
  plot_data,
  aes(
    y = REFERENCE_ID, # Use the formatted label
    x = formatted_species,
    fill = count
  )
) +
  geom_tile() +
  geom_text(aes(
    label = count,
    color = if_else(count > threshold_value, "black", "white")
  )) +
  scale_fill_viridis(name = "Sample Size (n)") +
  scale_color_identity() +

  theme_aep_plots() + # breaks markdown
  theme(
    # we have to specify axis.text.x.bottom or it doesn't work with theme_minimal()
    axis.text.x.bottom = element_markdown(angle = 90, hjust = 1, vjust = 0.5), # Enable markdown rendering
    axis.title.y = element_blank(),
    legend.position = "none"
  ) +
  labs(x = "Sampled Species") +

  facet_wrap(
    facets = vars(SPECIES_GROUP),
    space = "free_x",
    scales = "free_x",
    nrow = 1
  )
Sample size per species per paper across the dataset.

Geographic Distribution of Sampling Sites

In [15]:
Code

# Turn off spherical geometry ----
sf::sf_use_s2(FALSE)
Spherical geometry (s2) switched off
Code

# Prepare spatial data ----
data_sf <- literature_merged_data |>
  filter(
    !is.na(LATITUDE),
    !is.na(LONGITUDE),
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  st_as_sf(
    coords = c("LONGITUDE", "LATITUDE"),
    crs = st_crs("WGS84"),
    remove = FALSE)

  # Prepare apple data ----
  number_of_sites <- nrow(municipality_centroids())
  sales_data_split <- copper_oxide_sales() |> mutate(amount_kg = amount_kg / number_of_sites,
  average_kg = average_kg / number_of_sites)

  apple_data <- cross_join(municipality_centroids(), sales_data_split) |>
  group_by(name, x, y) |>
  summarise(
    avg_copper_kg = mean(amount_kg, na.rm = TRUE),
    .groups = "drop"
  ) |>
  st_as_sf(
    coords = c("x", "y"),
    crs = st_crs("WGS84"),
    remove = FALSE
  )

environ_compartments_vocabulary <- environ_compartments_vocabulary() |>
  setdiff(c("Not relevant", "Not reported", "Other"))

# Create colour palette for main compartments ----
compartment_pal <- colorFactor(
  palette = palette.colors(n = 4, palette = "Tableau"),
  domain = environ_compartments_vocabulary
)

# Create colour palette for measured values ----
value_range <- range(data_sf$MEASURED_VALUE_STANDARD, na.rm = TRUE)
value_pal <- colorNumeric(
  palette = "viridis",
  domain = value_range,
  na.color = "gray"
)

# Add codes to data ----
data_sf <- data_sf |>
  mutate(
    SUBCOMP_CODE = subcomp_codes[ENVIRON_COMPARTMENT_SUB],
    SUBCOMP_CODE = ifelse(is.na(SUBCOMP_CODE), "?", SUBCOMP_CODE)
  )

# Define map colours ----
marine_colors <- list(
  default = "lightblue",
  highlight = "darkblue"
)

country_colours <- list(
  default = "lightgreen",
  highlight = "darkgreen"
)

# Calculate polygon centroids for labels ----
marine_centroids <- wgs84_geo$marine_polys |>
  filter(highlight_name) |>
  st_centroid()
Warning: st_centroid assumes attributes are constant over geometries
Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
of_largest_polygon): st_centroid does not give correct centroids for
longitude/latitude data
Code

country_centroids <- wgs84_geo$countries |>
  filter(
    highlight_name,
    !name %in% c("Greenland", "Iceland", "Norway")
  ) |>
  st_centroid()
Warning: st_centroid assumes attributes are constant over geometries
Warning: st_centroid does not give correct centroids for longitude/latitude
data
Code

# Create study connection lines ----
study_lines <- data_sf |>
  st_drop_geometry() |>
  group_by(REFERENCE_ID) |>
  filter(n() > 1, CAMPAIGN_NAME_SHORT != "Vm_2025") |> # simply too many sites in Vm_2025 for polylines to be practical
  summarise(
    coords = list(cbind(LONGITUDE, LATITUDE)),
    .groups = "drop"
  )

# Convert to linestring sf object ----
study_lines_sf <- study_lines |>
  rowwise() |>
  mutate(
    geometry = list(st_linestring(coords))
  ) |>
  ungroup() |>
  select(-coords) |>
  st_sf(crs = st_crs("WGS84"))

@fig-sampling-map shows the spatial distribution of all sampling locations included in the analysis. This map helps visualise the geographic scope of copper monitoring in Arctic regions and identify areas with high or low sampling intensity.

In [16]:
Code

# Create map ----
map <- leaflet() |>
  addProviderTiles(providers$CartoDB.Positron) |>

  # Add ocean polygons (no popups) ----
  addPolygons(
    data = wgs84_geo$marine_polys |> filter(highlight_name),
    fillColor = ~ ifelse(
      highlight_name == TRUE,
      marine_colors[["highlight"]],
      marine_colors[["default"]]
    ),
    fillOpacity = ~ ifelse(highlight_name == TRUE, 0.3, 0),
    color = "black",
    weight = 1,
    group = "Geography"
  ) |>

  # Add country polygons (no popups) ----
  addPolygons(
    data = wgs84_geo$countries |> filter(highlight_name),
    fillColor = ~ ifelse(
      highlight_name == TRUE,
      country_colours[["highlight"]],
      country_colours[["default"]]
    ),
    fillOpacity = ~ ifelse(highlight_name == TRUE, 0.3, 0),
    weight = 1,
    color = "black",
    group = "Geography"
  ) |>

  # Add permanent labels for oceans ----
  addLabelOnlyMarkers(
    data = marine_centroids,
    label = ~name,
    labelOptions = labelOptions(
      noHide = TRUE,
      direction = "center",
      textOnly = TRUE,
      style = list(
        "color" = "darkblue",
        "font-weight" = "bold",
        "font-size" = "14px",
        "text-shadow" = "1px 1px 2px white, -1px -1px 2px white"
      )
    ),
    group = "Geography"
  ) |>

  # Add permanent labels for countries ----
  addLabelOnlyMarkers(
    data = country_centroids,
    label = ~name,
    labelOptions = labelOptions(
      noHide = TRUE,
      direction = "center",
      textOnly = TRUE,
      style = list(
        "color" = "darkgreen",
        "font-weight" = "bold",
        "font-size" = "12px",
        "text-shadow" = "1px 1px 2px white, -1px -1px 2px white"
      )
    ),
    group = "Geography"
  ) |>

  # Add study connection lines ----
  addPolylines(
    data = study_lines_sf,
    color = "purple",
    weight = 2,
    opacity = 0.5,
    dashArray = "5, 5",
    label = ~ paste0("Study: ", REFERENCE_ID),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal")
    ),
    group = "Study Connections"
  ) |>

  # Add mine tailings markers ----
  addMarkers(
    data = mine_tailings_coords(),
    lng = ~LONGITUDE,
    lat = ~LATITUDE,
    icon = list(
      iconUrl = "https://upload.wikimedia.org/wikipedia/commons/8/8c/Mining_symbol.svg",
      iconSize = c(15, 15)
    ),
    popup = ~ paste0(
      "<b>Mine:</b> ",
      MINE_NAME,
      "<br>",
      "<b>Location:</b> ",
      SITE_NAME,
      "<br>",
      "<b>Ore Type:</b> ",
      ORE_TYPE,
      "<br>",
      "<b>Status:</b> ",
      ifelse(ACTIVE_2013, "Active (2013)", "Terminated"),
      "<br>",
      "<b>Details:</b> ",
      KEY_DETAILS,
      "<br>",
      "<b>Coordinate Confidence:</b> ",
      CONFIDENCE,
      "/5"
    ),
    group = "Mine Tailings"
  ) |>

  # apple markers... for apples
  addMarkers(
    data = apple_data,
    lng = ~x,
    lat = ~y,
    icon = list(
      iconUrl = "https://www.svgrepo.com/show/42619/apple-fruit.svg",
      iconSize = c(15, 15)
    ),
    popup = ~ paste0(
      "<b>Municipality:</b> ",
      name,
      "<br>",
      "<b>Average Copper Applied:</b> ",
      round(avg_copper_kg, 1),
      " kg/year",
      "<br>",
      "<b>Period:</b> 2020-2024"
    ),
    group = "Apple Production"
  ) |>

  # add case study sites
    addCircles(
      data = expect_case_studies(),
    lng = ~lon,
    lat = ~lat,
    radius = 100000,  # 100km radius
    color = "hotpink",
    fillColor = "hotpink",
    fillOpacity = 0.3,
    opacity = 0.6,
    weight = 2,
    popup = ~paste0(
      "<strong>", site_name, "</strong><br/>",
      "<em>Lat: ", lat, ", Lon: ", lon, "</em><br/><br/>",
      summary
    ),
    group =       "Potential Case Studies"

  )

# Add layers for each main compartment ----
for (comp in environ_compartments_vocabulary) {
  data_subset <- data_sf |>
    filter(ENVIRON_COMPARTMENT == comp)

  if (nrow(data_subset) > 0) {
    # Build popup HTML with conditional species info
    popup_html <- if (comp == "Biota") {
      ~ paste0(
        "<b>Reference:</b> ",
        REFERENCE_ID,
        "<br>",
        "<b>Site:</b> ",
        SITE_CODE,
        "<br>",
        "<b>Date:</b> ",
        SAMPLING_DATE,
        "<br>",
        "<b>Compartment:</b> ",
        ENVIRON_COMPARTMENT,
        "<br>",
        "<b>Sub-compartment:</b> ",
        ENVIRON_COMPARTMENT_SUB,
        " (",
        SUBCOMP_CODE,
        ")",
        "<br>",
        "<b>Species:</b> ",
        SAMPLE_SPECIES,
        "<br>",
        "<b>Measured Value:</b> ",
        round(MEASURED_VALUE_STANDARD, 3),
        " ",
        MEASURED_UNIT_STANDARD
      )
    } else {
      ~ paste0(
        "<b>Reference:</b> ",
        REFERENCE_ID,
        "<br>",
        "<b>Site:</b> ",
        SITE_CODE,
        "<br>",
        "<b>Date:</b> ",
        SAMPLING_DATE,
        "<br>",
        "<b>Compartment:</b> ",
        ENVIRON_COMPARTMENT,
        "<br>",
        "<b>Sub-compartment:</b> ",
        ENVIRON_COMPARTMENT_SUB,
        " (",
        SUBCOMP_CODE,
        ")",
        "<br>",
        "<b>Measured Value:</b> ",
        round(MEASURED_VALUE_STANDARD, 3),
        " ",
        MEASURED_UNIT_STANDARD
      )
    }

    map <- map |>
      addCircleMarkers(
        data = data_subset,
        fillColor = ~ value_pal(MEASURED_VALUE_STANDARD),
        color = compartment_pal(comp),
        weight = 2,
        fillOpacity = 0.7,
        radius = 6,
        label = ~SUBCOMP_CODE,
        labelOptions = labelOptions(
          noHide = TRUE,
          direction = "center",
          textOnly = TRUE,
          style = list(
            "color" = compartment_pal(comp),
            "font-weight" = "normal",
            "font-size" = "8px"
          )
        ),
        popup = popup_html,
        group = "Samples"
      )
  }
}

# Add site labels as separate toggleable layer ----
map <- map |>
  addLabelOnlyMarkers(
    data = data_sf,
    label = ~SITE_CODE,
    labelOptions = labelOptions(
      noHide = FALSE,
      direction = "top",
      textOnly = TRUE,
      style = list(
        "color" = "black",
        "font-size" = "9px",
        "background" = "rgba(255, 255, 255, 0.7)",
        "padding" = "2px"
      )
    ),
    group = "Site Labels"
  ) |>

  # Add layer controls ----
  addLayersControl(
    baseGroups = character(0),
    overlayGroups = c(
      "Geography",
      "Study Connections",
      "Mine Tailings",
      "Apple Production",
      "Samples",
      "Site Labels",
      "Potential Case Studies"
    ),
    options = layersControlOptions(collapsed = FALSE)
  ) |>

  # Add legends ----
  addLegend(
    position = "bottomright",
    pal = compartment_pal,
    values = environ_compartments_vocabulary,
    title = "Compartment<br>(border colour)",
    opacity = 1
  ) |>
  addLegend(
    position = "bottomleft",
    pal = value_pal,
    values = data_sf$MEASURED_VALUE_STANDARD,
    title = "Measured Value<br>(fill colour)",
    opacity = 0.8,
    labFormat = labelFormat(digits = 1)
  )

# Display map ----
bslib::card(full_screen = TRUE, map, style = "text-align: left; z-index: 999999;") # yes, of course, that makes perfect sense

Copper Concentrations by Environmental Compartment

Get a basic overview of the orders of magnitude for different sub-compartments (@fig-basic-subcompartments), so we can plot them on scales that make sense. Note that I haven’t included uncertainty or sample sizes.

In [17]:
Code

literature_merged_data |>
  filter(!is.na(MEASURED_UNIT_STANDARD)) |>
  ggplot(aes(x = MEASURED_VALUE_STANDARD, y = ENVIRON_COMPARTMENT_SUB, color = ENVIRON_COMPARTMENT_SUB)) +
  geom_point() +
  facet_nested(rows = vars(ENVIRON_COMPARTMENT, MEASURED_UNIT_STANDARD),
  scales = "free_y",
  space = "free_y",
  nest_line = element_line()
  ) +
  theme_minimal() +
    scale_color_manual(
    values = compartment_colours,
    breaks = names(compartment_colours)
  ) +
    theme(legend.position = "none") +
    labs(x = "", y = "")
Warning: Removed 20 rows containing missing values or values outside the scale range
(`geom_point()`).

The following sections present copper concentration distributions for each major environmental compartment. Plots are based on reported means uncertainty bounds (SD, range, 95% CI, etc.) and overlaid with relevant regulatory thresholds where available.

Aquatic Compartment

Freshwater

@fig-boxplot-freshwater shows we have a fair amount of data for water. It’s not strictly useful to include saltwater, freshwater and inbetween on here, so I think it makes sense to split it - but it won’t make a massive difference to the final product.

Some of these units raise some big questions. Why is brackish/transitional using dry? And why are there some aquatic biota using mg/L?

In [18]:
Code
# Calculate sample sizes
sample_sizes_water <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB == "Freshwater",
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  calculate_sample_sizes()

# Get first facet for labels
first_facet <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB == "Freshwater",
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  distinct(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE) |>
  slice(1)

# Create plot
literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB == "Freshwater",
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  ggplot(aes(
    x = MEASURED_OR_IMPUTED_VALUE_STANDARD,
    y = REFERENCE_ID,
    colour = ENVIRON_COMPARTMENT_SUB
  )) +
  # ... (all your other geoms remain the same) ...
  geom_crossbar(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "95% Confidence Interval"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    )
  ) +
  geom_linerange(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    linewidth = 0.8
  ) +
  geom_point(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    size = 2
  ) +
  geom_boxplot(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "1st-3rd Quartile"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      lower = UNCERTAINTY_LOWER_STANDARD,
      middle = MEASURED_OR_IMPUTED_VALUE_STANDARD,
      upper = UNCERTAINTY_UPPER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    stat = "identity"
  ) +
  geom_point(
    alpha = 0.5,
    data = \(x) {
      filter(
        x,
        is_not_reported(UNCERTAINTY_TYPE) | is_not_relevant(UNCERTAINTY_TYPE)
      )
    }
  ) +
  geom_text(
    data = sample_sizes_water,
    aes(x = -Inf, y = REFERENCE_ID, label = paste0("n=", total_n)),
    hjust = -1, vjust = -1,
    inherit.aes = FALSE,
    size = 3
  ) +
  geom_vline(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps, THRESHOLD_TYPE == "Classification boundary")
    },
    aes(xintercept = THRESHOLD_VALUE, color = THRESHOLD_CLASS),
    linetype = "dashed",
    linewidth = 0.8,
    alpha = 0.6
  ) +
  # Threshold labels - only in first facet
  geom_text(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps) |>
        cross_join(first_facet)
    },
    aes(x = THRESHOLD_VALUE, y = Inf, label = THRESHOLD_CLASS, color = THRESHOLD_CLASS),
    vjust = 1.5,
    angle = 90,
    hjust = 1,
    size = 3,
    inherit.aes = FALSE
  ) +
  scale_color_manual(
    values = c(threshold_colours, compartment_colours),
    breaks = names(compartment_colours)
  ) +
  labs(
    x = "Concentration (mg/L)",
    y = "Reference",
    colour = "Sub-compartment"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  ) +
  facet_nested(
    rows = vars(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE),
    scales = "free_y",
    space = "free_y",
    shrink = TRUE,
    labeller = label_wrap_gen(width = 15),
    nest_line = element_line()
  ) +
      coord_cartesian(
    xlim = range(
      literature_merged_data$UNCERTAINTY_UPPER_STANDARD[
        literature_merged_data$ENVIRON_COMPARTMENT_SUB == "Freshwater"
      ],
      na.rm = TRUE
    ) |> round(),
    clip = "off"  # allows threshold labels to extend beyond plot area if needed
  )
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_vline()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_text()`).
Copper concentrations in water column samples by reference

Salt and Transitional Water

@fig-boxplot-saltwater

In [19]:
Code
# Calculate sample sizes
chosen_compartments <- c("Marine/Salt Water", "Brackish/Transitional Water")

sample_sizes_water <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  calculate_sample_sizes()

# Get first facet for labels
first_facet <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  distinct(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE) |>
  slice(1)

# Create plot
literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  ggplot(aes(
    x = MEASURED_OR_IMPUTED_VALUE_STANDARD,
    y = REFERENCE_ID,
    colour = ENVIRON_COMPARTMENT_SUB
  )) +
  # ... (all your other geoms remain the same) ...
  geom_crossbar(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "95% Confidence Interval"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    )
  ) +
  geom_linerange(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    linewidth = 0.8
  ) +
  geom_point(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    size = 2
  ) +
  geom_boxplot(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "1st-3rd Quartile"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      lower = UNCERTAINTY_LOWER_STANDARD,
      middle = MEASURED_OR_IMPUTED_VALUE_STANDARD,
      upper = UNCERTAINTY_UPPER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    stat = "identity"
  ) +
  geom_point(
    alpha = 0.5,
    data = \(x) {
      filter(
        x,
        is_not_reported(UNCERTAINTY_TYPE) | is_not_relevant(UNCERTAINTY_TYPE)
      )
    }
  ) +
  geom_text(
    data = sample_sizes_water,
    aes(x = -Inf, y = REFERENCE_ID, label = paste0("n=", total_n)),
    hjust = -1, vjust = -1,
    inherit.aes = FALSE,
    size = 3
  ) +
  geom_vline(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps, THRESHOLD_TYPE == "Classification boundary")
    },
    aes(xintercept = THRESHOLD_VALUE, color = THRESHOLD_CLASS),
    linetype = "dashed",
    linewidth = 0.8,
    alpha = 0.6
  ) +
  # Threshold labels - only in first facet
  geom_text(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps) |>
        cross_join(first_facet)
    },
    aes(x = THRESHOLD_VALUE, y = Inf, label = THRESHOLD_CLASS, color = THRESHOLD_CLASS),
    vjust = 1.5,
    angle = 90,
    hjust = 1,
    size = 3,
    inherit.aes = FALSE
  ) +
  scale_color_manual(
    values = c(threshold_colours, compartment_colours),
    breaks = names(compartment_colours)
  ) +
  labs(
    x = "Concentration (mg/L)",
    y = "Reference",
    colour = "Sub-compartment"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  ) +
  facet_nested(
    rows = vars(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE),
    scales = "free_y",
    space = "free_y",
    shrink = TRUE,
    labeller = label_wrap_gen(width = 15),
    nest_line = element_line()
  ) +
      coord_cartesian(
    xlim = range(
      literature_merged_data$UNCERTAINTY_UPPER_STANDARD[
        literature_merged_data$ENVIRON_COMPARTMENT_SUB %in% chosen_compartments
      ],
      na.rm = TRUE
    ) |> round(),
    clip = "off"  # allows threshold labels to extend beyond plot area if needed
  )
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf
Copper concentrations in water column samples by reference

Aquatic Sediment

@fig-boxplot-sediment

In [20]:
Code
# Calculate sample sizes
chosen_compartments <- c("Aquatic Sediment")

sample_sizes_water <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  calculate_sample_sizes()

# Get first facet for labels
first_facet <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  distinct(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE) |>
  slice(1)

# Create plot
literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  ggplot(aes(
    x = MEASURED_OR_IMPUTED_VALUE_STANDARD,
    y = REFERENCE_ID,
    colour = ENVIRON_COMPARTMENT_SUB
  )) +
  # ... (all your other geoms remain the same) ...
  geom_crossbar(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "95% Confidence Interval"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    )
  ) +
  geom_linerange(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    linewidth = 0.8
  ) +
  geom_point(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    size = 2
  ) +
  geom_boxplot(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "1st-3rd Quartile"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      lower = UNCERTAINTY_LOWER_STANDARD,
      middle = MEASURED_OR_IMPUTED_VALUE_STANDARD,
      upper = UNCERTAINTY_UPPER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    stat = "identity"
  ) +
  geom_point(
    alpha = 0.5,
    data = \(x) {
      filter(
        x,
        is_not_reported(UNCERTAINTY_TYPE) | is_not_relevant(UNCERTAINTY_TYPE)
      )
    }
  ) +
  geom_text(
    data = sample_sizes_water,
    aes(x = -Inf, y = REFERENCE_ID, label = paste0("n=", total_n)),
    hjust = -1, vjust = -1,
    inherit.aes = FALSE,
    size = 3
  ) +
  geom_vline(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps, THRESHOLD_TYPE == "Classification boundary")
    },
    aes(xintercept = THRESHOLD_VALUE, color = THRESHOLD_CLASS),
    linetype = "dashed",
    linewidth = 0.8,
    alpha = 0.6
  ) +
  # Threshold labels - only in first facet
  geom_text(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps) |>
        cross_join(first_facet)
    },
    aes(x = THRESHOLD_VALUE, y = Inf, label = THRESHOLD_CLASS, color = THRESHOLD_CLASS),
    vjust = 1.5,
    angle = 90,
    hjust = 1,
    size = 3,
    inherit.aes = FALSE
  ) +
  scale_color_manual(
    values = c(threshold_colours, compartment_colours),
    breaks = names(compartment_colours)
  ) +
  labs(
    x = "Concentration (mg/L)",
    y = "Reference",
    colour = "Sub-compartment"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  ) +
  facet_nested(
    rows = vars(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE),
    scales = "free_y",
    space = "free_y",
    shrink = TRUE,
    labeller = label_wrap_gen(width = 15),
    nest_line = element_line()
  ) +
      coord_cartesian(
    xlim = range(
      literature_merged_data$UNCERTAINTY_UPPER_STANDARD[
        literature_merged_data$ENVIRON_COMPARTMENT_SUB %in% chosen_compartments
      ],
      na.rm = TRUE
    ) |> round(),
    clip = "off"  # allows threshold labels to extend beyond plot area if needed
  )
Copper concentrations in aquatic sediment samples by reference

Terrestrial Compartment

Currently, data on copper in the terrestrial compartment is limited to one study (@fig-boxplot-terrestrial).

In [21]:
Code
# Calculate sample sizes
chosen_compartments <- c("Soil O Horizon (Organic)")

sample_sizes_water <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  calculate_sample_sizes()

# Get first facet for labels
first_facet <- literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  distinct(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE) |>
  slice(1)

# Create plot
literature_merged_data |>
  filter(
    ENVIRON_COMPARTMENT_SUB %in% chosen_compartments,
    !is.na(MEASURED_OR_IMPUTED_VALUE_STANDARD)
  ) |>
  ggplot(aes(
    x = MEASURED_OR_IMPUTED_VALUE_STANDARD,
    y = REFERENCE_ID,
    colour = ENVIRON_COMPARTMENT_SUB
  )) +
  # ... (all your other geoms remain the same) ...
  geom_crossbar(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "95% Confidence Interval"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    )
  ) +
  geom_linerange(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    linewidth = 0.8
  ) +
  geom_point(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "Standard Deviation"),
    size = 2
  ) +
  geom_boxplot(
    data = \(x) filter(x, UNCERTAINTY_TYPE == "1st-3rd Quartile"),
    aes(
      xmin = UNCERTAINTY_LOWER_STANDARD,
      lower = UNCERTAINTY_LOWER_STANDARD,
      middle = MEASURED_OR_IMPUTED_VALUE_STANDARD,
      upper = UNCERTAINTY_UPPER_STANDARD,
      xmax = UNCERTAINTY_UPPER_STANDARD
    ),
    stat = "identity"
  ) +
  geom_point(
    alpha = 0.5,
    data = \(x) {
      filter(
        x,
        is_not_reported(UNCERTAINTY_TYPE) | is_not_relevant(UNCERTAINTY_TYPE)
      )
    }
  ) +
  geom_text(
    data = sample_sizes_water,
    aes(x = -Inf, y = REFERENCE_ID, label = paste0("n=", total_n)),
    hjust = -1, vjust = -1,
    inherit.aes = FALSE,
    size = 3
  ) +
  geom_vline(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps, THRESHOLD_TYPE == "Classification boundary")
    },
    aes(xintercept = THRESHOLD_VALUE, color = THRESHOLD_CLASS),
    linetype = "dashed",
    linewidth = 0.8,
    alpha = 0.6
  ) +
  # Threshold labels - only in first facet
  geom_text(
    data = \(x) {
      sub_comps <- unique(x$ENVIRON_COMPARTMENT_SUB)
      copper_toxicity_thresholds |>
        filter(ENVIRON_COMPARTMENT_SUB %in% sub_comps) |>
        cross_join(first_facet)
    },
    aes(x = THRESHOLD_VALUE, y = Inf, label = THRESHOLD_CLASS, color = THRESHOLD_CLASS),
    vjust = 1.5,
    angle = 90,
    hjust = 1,
    size = 3,
    inherit.aes = FALSE
  ) +
  scale_color_manual(
    values = c(threshold_colours, compartment_colours),
    breaks = names(compartment_colours)
  ) +
  labs(
    x = "Concentration (mg/L)",
    y = "Reference",
    colour = "Sub-compartment"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  ) +
  facet_nested(
    rows = vars(OCEAN_COUNTRY, SITE_GEOGRAPHIC_FEATURE),
    scales = "free_y",
    space = "free_y",
    shrink = TRUE,
    labeller = label_wrap_gen(width = 15),
    nest_line = element_line()
  ) +
      coord_cartesian(
    xlim = range(
      literature_merged_data$UNCERTAINTY_UPPER_STANDARD[
        literature_merged_data$ENVIRON_COMPARTMENT_SUB %in% chosen_compartments
      ],
      na.rm = TRUE
    ) |> round(),
    clip = "off"  # allows threshold labels to extend beyond plot area if needed
  )
Copper concentrations in terrestrial samples by reference

Biota Compartment

Aquatic Biota

presents copper body burdens across different species groups. Note: There appear to be issues with the distribution calculations for biota that require further investigation.

In [22]:
Code

# TODO: Fix pls
# literature_merged_data |>
#   filter(
#     ENVIRON_COMPARTMENT == "Biota"
#   ) |>
#   ggplot(aes(
#     x = MEASURED_OR_IMPUTED_VALUE_STANDARD,
#     y = SAMPLE_SPECIES,
#     colour = SPECIES_GROUP
#   )) +
#   geom_pointrange(aes(
#     xmin = UNCERTAINTY_UPPER_STANDARD,
#     xmax = UNCERTAINTY_UPPER_STANDARD
#   )) +
#   facet_wrap(
#     facet = vars(OCEAN_COUNTRY),
#     ncol = 1,
#     scales = "free_y",
#     space = "free_y"
#   ) |> theme_minimal()

Network Diagram

In [25]:
Code

unique_compartments_and_sites <- literature_merged_data |>
  select(
    SITE_GEOGRAPHIC_FEATURE,
    SITE_GEOGRAPHIC_FEATURE_SUB,
    ENVIRON_COMPARTMENT,
    ENVIRON_COMPARTMENT_SUB,
    MEASURED_N
  ) |>
  distinct() |>
  filter(if_all(everything(), ~ !is.na(.))) |>
  group_by(SITE_GEOGRAPHIC_FEATURE, SITE_GEOGRAPHIC_FEATURE_SUB, ENVIRON_COMPARTMENT, ENVIRON_COMPARTMENT_SUB) |>
  reframe(sum_n = sum(MEASURED_N)) |>
  mutate(compartment_name_n = glue::glue("{ENVIRON_COMPARTMENT_SUB}\n{sum_n}"))


# TODO: This is very speculative and just generated by AI. I haven't yet reviewed it to see what's missing.
# Copper flow network edges ----
copper_flow_edges <- tribble(
  ~from, ~to, ~flow_type, ~plausibility, ~magnitude, ~evidence_quality, ~description,

  # Atmospheric deposition to surface waters ----
  "Atmospheric", "Freshwater", "deposition", 0.95, "medium", "high", "Wet and dry deposition from atmospheric particulates",
  "Atmospheric", "Brackish/Transitional Water", "deposition", 0.95, "medium", "high", "Wet and dry deposition to coastal/estuarine waters",
  "Atmospheric", "Aquatic Sediment", "deposition", 0.90, "low", "medium", "Direct deposition to exposed sediments",
  "Atmospheric", "Soil O Horizon (Organic)", "deposition", 0.95, "medium", "high", "Atmospheric deposition to terrestrial surfaces",

  # Terrestrial to aquatic flows ----
  "Soil O Horizon (Organic)", "Freshwater", "runoff", 0.95, "high", "high", "Surface runoff and leaching from soils",
  "Soil O Horizon (Organic)", "Groundwater", "percolation", 0.85, "medium", "medium", "Downward migration through soil profile",
  "Biota, Terrestrial", "Soil O Horizon (Organic)", "excretion_decomposition", 0.90, "low", "medium", "Excretion and decomposition of terrestrial organisms",

  # Groundwater to surface water ----
  "Groundwater", "Freshwater", "baseflow", 0.90, "medium", "high", "Groundwater discharge to streams/rivers",
  "Groundwater", "Brackish/Transitional Water", "discharge", 0.75, "low", "medium", "Submarine groundwater discharge to coastal waters",

  # Freshwater flows ----
  "Freshwater", "Brackish/Transitional Water", "advection", 0.95, "high", "high", "River discharge to estuaries",
  "Freshwater", "Aquatic Sediment", "sedimentation", 0.95, "high", "high", "Particulate settling from water column",
  "Freshwater", "Biota, Aquatic", "uptake", 0.95, "medium", "high", "Bioaccumulation by freshwater organisms",

  # Brackish/transitional water flows ----
  "Brackish/Transitional Water", "Aquatic Sediment", "sedimentation", 0.95, "high", "high", "Particulate settling in estuarine environments",
  "Brackish/Transitional Water", "Biota, Aquatic", "uptake", 0.95, "medium", "high", "Bioaccumulation by estuarine/coastal organisms",

  # Sediment processes ----
  "Aquatic Sediment", "Freshwater", "resuspension", 0.85, "medium", "high", "Sediment resuspension and porewater release",
  "Aquatic Sediment", "Brackish/Transitional Water", "resuspension", 0.85, "medium", "high", "Sediment resuspension in coastal/estuarine areas",
  "Aquatic Sediment", "Biota, Aquatic", "uptake", 0.90, "low", "medium", "Benthic organism uptake from sediments",
  "Aquatic Sediment", "Groundwater", "diagenesis", 0.70, "low", "low", "Deep burial and diagenetic processes",

  # Biota cycling ----
  "Biota, Aquatic", "Aquatic Sediment", "excretion_decomposition", 0.95, "medium", "high", "Fecal pellets, mortality, decomposition",
  "Biota, Aquatic", "Freshwater", "excretion", 0.90, "low", "medium", "Dissolved excretion by aquatic organisms",
  "Biota, Aquatic", "Brackish/Transitional Water", "excretion", 0.90, "low", "medium", "Dissolved excretion in coastal waters",
  "Biota, Aquatic", "Biota, Terrestrial", "trophic_transfer", 0.80, "low", "medium", "Predation by terrestrial organisms (e.g., birds, mammals)",

  # Wastewater flows ----
  "Wastewater", "Freshwater", "discharge", 0.90, "high", "high", "Treated or untreated wastewater discharge",
  "Wastewater", "Brackish/Transitional Water", "discharge", 0.80, "medium", "medium", "Coastal wastewater discharge",
  "Wastewater", "Sludge", "treatment", 0.95, "high", "high", "Copper partitioning to sludge during treatment",

  # Sludge applications ----
  "Sludge", "Soil O Horizon (Organic)", "application", 0.85, "medium", "medium", "Agricultural application of biosolids",
  "Sludge", "Freshwater", "discharge", 0.60, "low", "low", "Illegal or emergency discharge (historical)",

  # Terrestrial-atmospheric ----
  "Soil O Horizon (Organic)", "Atmospheric", "resuspension", 0.70, "low", "low", "Wind erosion and dust generation",
  "Biota, Terrestrial", "Atmospheric", "volatilization", 0.40, "negligible", "low", "Unlikely for copper but included for completeness"
)

# Add reverse flow indicators for bidirectional processes if needed ----
# don't work yet
# copper_flow_edges |> left_join(unique_compartments_and_sites |> select(ENVIRON_COMPARTMENT_SUB, compartment_name_n), by = "ENVIRON_COMPARTMENT_SUB")

# Summary statistics ----
# copper_flow_edges |>
#   count(flow_type, sort = TRUE)

# copper_flow_edges |>
#   count(magnitude)

# Then plot
graph <- as_tbl_graph(copper_flow_edges)
In [26]:
Code

ggraph(graph, layout = "tree") +
    geom_edge_diagonal2(aes(label = flow_type) , colour = "darkgrey", label_colour = "black",
  strength = 0.8,
  check_overlap = TRUE,
  angle_calc = "along",
                  arrow = arrow(length = unit(4, 'mm')),
                start_cap = circle(10, 'mm'),
                end_cap = circle(10, 'mm')) +
  geom_node_point(size = 20, aes(colour = name))+
  geom_node_text(aes(label = str_wrap(name, 10)), size = 3,) +
  scale_fill_manual(values = compartment_colours) +
  theme_void() +
  theme(legend.position = "none")
Warning: No shared levels found between `names(values)` of the manual scale and the
data's fill values.

Appendices

Appendix 1: Copper Sources Data

Submarine Tailing Disposal in/off Norway

Note: STD locations (named) extracted from Kvassnes & Iversen 2013. Coordinates estimated by LLM, confidence self-reported by LLM.

In [27]:
Code
mine_tailings_coords() |> DT::datatable()
In [28]:

Apple Production Areas in Norway

According to https://static02.nmbu.no/mina/studier/moppgaver/2018-Paulsen.pdf, 7 municipalities represent the main fruit districts in Norway. We also have sales of copper oxide for 2020-24 from https://mattilsynet-xp7prod.enonic.cloud/_/attachment/inline/8311cd70-8877-4712-98cc-f7bc1dba46ac:0f491609e72f1d5f0dca5ad1030a4b979e8b4695/Omsetningsstatistikk%20for%20plantevernmidler%202020%20-%202024%20med%20uthevinger.pdf (table 2a, p4, thanks to Roger Holten).

N.B: Re Ronny Berdinsen, Mattilsynet (2025.12.15)

Hei

Tallene ble korrigerte pga en feilrapportering i 2022. Riktige tall er:

  • 2022: 4332 liter (9482liter - 5150 liter)
  • 2023: 2400 liter

Merk også at statistikken ikke viser plantevernmidler brukt pr år, men importert mengde. Det er ikke gitt at det som er importert i eksempelvis 2022 er brukt av bonden samme år.

In [29]:
Code
number_of_sites <- nrow(municipality_centroids())
sales_data_split <- copper_oxide_sales() |> mutate(amount_kg = amount_kg / number_of_sites,
average_kg = average_kg / number_of_sites)

cross_join(municipality_centroids(), sales_data_split) |>
   DT::datatable()
In [30]:

Although not explicitly stated, we assume that:

  • Copper products used as fungicides are exclusively applied in these areas and enter the environment via soil and freshwater
  • Apple farmers use the arthimetic mean of sold copper oxide for 2020 - 2024 every year. all applied copper oxide enters the environment.
  • The total used copper is split equally between the “main fruit districts” named in Paulsen (2018).

Manufacturing Emissions of Copper in Norway

Taken from https://www.norskeutslipp.no/no/Komponenter/Utslipp/Kobber/?ComponentType=utslipp&ComponentPageID=73&SectorID=600, which includes self-reported copper emissions by land-based Norwegian industries for 1985-2024. Date is very spotty and incomplete, but it allows us to highlight fylke with higher emissions. Map to follow.

We assume:

  • Reported concentrations are fully correct
  • Everything ends up in the environment.
In [31]:
Code

source("temp_R/scratchpad_norske_utslippe.R")

top_municipalities_plot